Other GLMs

Other generalized linear models

  • Generalized linear models offer a framework for applying something like linear regression in all sorts of different contexts.

  • All GLMs share the same basic setup: the expected value of y is modeled as a linear combination of predictors and coefficients, passed through the inverse of a link function:

    • In OLS, the link is the identity, so \(E(y) = X^T\beta\)
    • In logit models, \(E(y) = \text{logit}^{-1}(X^T\beta)\)
    • In probit models, \(E(y) = \Phi(X^T\beta)\), where \(\Phi\) is the standard normal CDF

We’ll talk about a couple other common models that use this same framework.
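In R, the link enters through glm()'s family argument. A minimal sketch with simulated data (all variable names here are made up for illustration):

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))  # binary outcome generated from a logit model

# identity link: equivalent to lm()
ols    <- glm(y ~ x, family = gaussian(link = "identity"))
# logit link
logit  <- glm(y ~ x, family = binomial(link = "logit"))
# probit link
probit <- glm(y ~ x, family = binomial(link = "probit"))

family(logit)$link  # "logit"
```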

Counts

Count models can be used to model counts or rates: outcomes that take non-negative integer values (0, 1, 2, …, with no fixed upper bound). Some examples might include:

  • Number of battle deaths in a conflict

  • Number of terror attacks in a country

  • Number of protests in a year

  • Number of years between coups (although this may be classified as a survival model)

Poisson Distribution

The simplest way to model a count is with a Poisson distribution: \(y \sim \text{Poisson}(e^{X^T\beta})\)

  • A Poisson distribution has a single parameter, \(\lambda\), which represents the average rate of occurrence over a fixed time period.
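Because the distribution has only the single parameter \(\lambda\), its mean and variance are both \(\lambda\). A quick simulation sketch (not from the slides):

```r
set.seed(123)
draws <- rpois(100000, lambda = 4)  # 100,000 draws with rate 4
mean(draws)  # approximately 4
var(draws)   # also approximately 4: mean and variance are both lambda
```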

Poisson Distribution

The expected value of \(Y|X\) is

\[ E(Y|X) = e ^{(b_0 + b_1X_1 +b_2X_2...)} \]

# the conditional mean of a Poisson model: exp(beta0 + beta1*x)
poisson_func <- function(x, beta0, beta1) {
  exp(beta0 + beta1*x)
}
curve(poisson_func(x, 0, 1), xlim=c(-1, 5), lwd=2, ylab="Expected count", las=1)
curve(poisson_func(x, -2, 1), xlim=c(-1, 5), lwd=2, lty=2, col="red", add=TRUE)
curve(poisson_func(x, 2, 1), xlim=c(-1, 5), lwd=2, lty=3, col="blue", add=TRUE)

legend(-1, 150, c(expression(paste(beta[0], " = 0")),
                  expression(paste(beta[0], " = -2")),
                  expression(paste(beta[0], " = 2"))), 
       lwd=2, lty=1:3, col=c("black", "red", "blue"))

Poisson Distribution

poisson_func <- function(x, beta0, beta1) {
  exp(beta0 + beta1*x)
}
curve(poisson_func(x, 3, -2), xlim=c(-1, 5), lwd=2, ylab="Expected count", las=1)
curve(poisson_func(x, 3, 0), xlim=c(-1, 5), lwd=2, lty=2, col="blue", add=TRUE)
curve(poisson_func(x, 3, 2), xlim=c(-1, 5), lwd=2, lty=3, col="red", add=TRUE)

legend(3, 150, c(expression(paste(beta[1], " = -2")),
                 expression(paste(beta[1], " = 0")),
                 expression(paste(beta[1], " = 2"))), 
       lwd=2, lty=1:3, col=c("black", "blue", "red"))

Poisson model

As with logit and probit models, the link function means that our coefficient values are somewhat unintuitive. Technically, they represent the effect of a one-unit change in X on the log of the expected count of the dependent variable.

model <- glm(nkill ~ e_gdppc,
             data=gtd_counts, family='poisson')
modelsummary(list("Number killed" = model), 
             coef_rename=T, 
             estimate  = "{estimate}",  
             statistic = c("conf.int"),
             conf_level = .95,     
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title ='Number killed in terrorist attacks in 2018'
             )
Number killed in terrorist attacks in 2018

                           Number killed
(Intercept)                0.548
                           [0.400, 0.692]
GDP per capita ($100,000)  0.602
                           [0.375, 0.821]
Num.Obs.                   176
AIC                        1867.1
BIC                        1873.5
Log.Lik.                   -931.564

Note: 95% CI in brackets

Poisson model

Exponentiating the coefficients turns this into a multiplicative effect, so each coefficient represents the change in the rate of the dependent variable: an exponentiated coefficient of 0.5 means that a one-unit increase in X halves the expected count, while a coefficient of 2 means it doubles.
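A tiny arithmetic sketch of this multiplicative interpretation (the coefficients here are made up):

```r
b0 <- 0.5
b1 <- log(2)  # exponentiates to exactly 2
rate <- function(x) exp(b0 + b1 * x)

rate(1) / rate(0)  # = exp(b1) = 2: each additional unit of x doubles the expected count
```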

model<-glm(nkill ~ e_gdppc,
           data=gtd_counts, family='poisson')
modelsummary(list("Number killed" = model), 
             coef_rename=T, 
             estimate  = "{estimate}",  
             statistic = c("conf.int"),
             conf_level = .95,     
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title ='Number killed in terrorist attacks in 2018',
             exponentiate =TRUE
             )
Number killed in terrorist attacks in 2018

                           Number killed
(Intercept)                1.730
                           [1.492, 1.997]
GDP per capita ($100,000)  1.826
                           [1.455, 2.274]
Num.Obs.                   176
AIC                        1867.1
BIC                        1873.5
Log.Lik.                   -931.564

Note: 95% CI in brackets

Poisson model predictions

For predicted counts, we can simply take the linear prediction from the model and then exponentiate it:

x <- cbind(1, e_gdppc = seq(0, 2, by=.1))
predictions <- exp(x %*% cbind(coef(model)))
data.frame(x, predictions)|>
  ggplot(aes(x=e_gdppc*100000, y=predictions)) + 
  geom_point() +
  geom_line() +
  theme_bw() +
  labs(x='GDP Per Capita ($USD)', y='Predicted deaths') +
  scale_x_continuous(labels = scales::label_currency()) 

Poisson Overdispersion

The Poisson distribution assumes that the variance is equal to the mean (recall that the distribution has only a single parameter, \(\lambda\)). This is rarely true for real-world count data:

# number killed in terrorist attacks, per country
mean(gtd_counts$nkill)
[1] 2.238636
var(gtd_counts$nkill)
[1] 63.0513
library(distributional)  # dist_poisson()
library(ggdist)          # stat_slab()

data.frame(dist = dist_poisson(lambda = mean(gtd_counts$nkill)))|>
  ggplot(aes(xdist = dist))+
  stat_slab(alpha=.7) +
  stat_slab(data=gtd_counts, aes(x=nkill), 
            fill='lightblue', 
            inherit.aes = FALSE, alpha=.5) +
  theme_bw() +
  labs(title = 'Terrorist attacks in 2018 compared to a Poisson distribution with the same mean') 

Poisson Overdispersion

  • Models with more variance than expected are called “overdispersed”, and this causes us to underestimate our standard errors.

    • This can also happen with other distributions, but it’s probably less common because most distributions allow the variance to differ from the mean.
  • This may be the result of different data generating processes, e.g. one pattern for states with terrorist insurgencies and a separate pattern for those without.

    • In some cases there may be a way to fix this with a different set of parameters, but there’s no guarantee.

Options for handling overdispersion

  • Quasi-Poisson models maximize a quasi-likelihood that allows the variance to differ from the mean.

  • A random effects model with one random effect per observation

  • Hurdle or zero-inflated models (in special cases)

  • A negative binomial model
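A quick sketch of the quasi-Poisson option on simulated overdispersed data (generated here from a negative binomial): the point estimates are identical to the Poisson fit, but the standard errors are scaled up by the estimated dispersion:

```r
set.seed(3)
n <- 2000
x <- rnorm(n)
y <- rnbinom(n, size = 1, mu = exp(0.5 + 0.8 * x))  # overdispersed counts

pois  <- glm(y ~ x, family = poisson)
qpois <- glm(y ~ x, family = quasipoisson)

coef(pois) - coef(qpois)  # identical point estimates

# SE ratio > 1: quasi-Poisson inflates the standard errors
summary(qpois)$coefficients["x", "Std. Error"] /
  summary(pois)$coefficients["x", "Std. Error"]
```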

Negative Binomial Model

  • The negative binomial distribution can be thought of as modeling the number of failures until a specified number of successes. Unlike the Poisson distribution, the negative binomial has a separate dispersion parameter, so its variance can exceed its mean and it can account for overdispersion:

library(distributional)
library(ggdist)

tibble('a' = dist_negative_binomial(size=1, prob=.5),
       'b' = dist_negative_binomial(size=10, prob=.5),
       'c' = dist_negative_binomial(size=100, prob=.5)
       )|>
  ggplot()+
  stat_slab(aes(xdist = a), fill='lightblue', alpha=.7) +
  stat_slab(aes(xdist = b), fill='lightgreen', alpha=.7) +
  stat_slab(aes(xdist = c), fill='orange', alpha=.7) +
  theme_bw() 
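A simulation sketch (not from the slides) of the mean-variance relationship: under R's `rnbinom()` mu/size parameterization, the variance is mu + mu^2/size, so it always exceeds the mean:

```r
set.seed(42)
size <- 2
mu <- 3
y <- rnbinom(100000, size = size, mu = mu)

mean(y)  # approximately mu = 3
var(y)   # approximately mu + mu^2/size = 7.5, well above the mean
```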

Poisson vs. negative binomial

model <- glm(nkill ~ e_gdppc,
             data=gtd_counts, family='poisson')
nb_model <- MASS::glm.nb(nkill ~ e_gdppc,
                         data=gtd_counts)
Number killed in terrorist attacks in 2018

                           poisson          negative binomial
(Intercept)                0.548            0.508
                           [0.400, 0.692]   [-0.293, 1.411]
GDP per capita ($100,000)  0.602            0.697
                           [0.375, 0.821]   [-0.729, 2.647]
Num.Obs.                   176              176
AIC                        1867.1           424.6
BIC                        1873.5           434.1
Log.Lik.                   -931.564         -209.297

Note: 95% CI in brackets

Poisson vs. negative binomial

The inverse link for the negative binomial and Poisson models is the same, so the exponentiated coefficients have the same interpretation as they did for the Poisson model:

model_list<-list('poisson' = model,
                 "negative binomial" = nb_model)
modelsummary(model_list,
             coef_rename = TRUE,
             estimate  = "{estimate}",  
             statistic = c("conf.int"),
             conf_level = .95,     
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title ='Number killed in terrorist attacks in 2018. Exponentiated coefficients.',
             exponentiate =TRUE
             )
Number killed in terrorist attacks in 2018. Exponentiated coefficients.

                           poisson          negative binomial
(Intercept)                1.730            1.662
                           [1.492, 1.997]   [0.746, 4.099]
GDP per capita ($100,000)  1.826            2.008
                           [1.455, 2.274]   [0.482, 14.113]
Num.Obs.                   176              176
AIC                        1867.1           424.6
BIC                        1873.5           434.1
Log.Lik.                   -931.564         -209.297

Note: 95% CI in brackets

Testing Overdispersion

The performance package has a function to check for overdispersion using the Pearson residuals from a count model. The null hypothesis here is no overdispersion:

library(performance)
check_overdispersion(model)
# Overdispersion test

       dispersion ratio =   27.681
  Pearson's Chi-Squared = 4816.528
                p-value =  < 0.001

We can also see this is addressed by the negative binomial model:

check_overdispersion(nb_model)
# Overdispersion test

 dispersion ratio = 0.740
          p-value = 0.872

Model fit

Since the only difference here is the addition of a single dispersion parameter, we can treat these two models as nested and use a likelihood ratio test to determine whether the negative binomial model is a better fit. Unsurprisingly, it is:

pchisq(2 * (logLik(nb_model) - logLik(model)), df = 1, lower.tail = FALSE)
'log Lik.' 4.416128e-316 (df=3)

Marginal Effects

library(marginaleffects)

mfx <- avg_comparisons(nb_model)

modelsummary(mfx, 
             coef_rename = TRUE,
             estimate  = "{estimate}",  
             statistic = c("conf.int"),
             conf_level = .95,     
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Average marginal effect on the number killed'
             )

Average marginal effect on the number killed

           (1)
e_gdppc    2.270
           [-4.931, 9.472]
Num.Obs.   176
AIC        424.6
BIC        434.1
Log.Lik.   -209.297

Note: 95% CI in brackets

Using a Random Effect

A random effects model also provides an alternative way to account for overdispersion, adding a random intercept for every case in the model:

library(lme4)


mmodel <- glmer(nkill ~ e_gdppc + (1|country_txt),
                data=gtd_counts, family='poisson')

performance::check_overdispersion(mmodel)
# Overdispersion test

       dispersion ratio = 0.005
  Pearson's Chi-Squared = 0.943
                p-value =     1

Count models with an offset

Poisson and negative binomial models can also be used to model rates. For instance, we might want to model the number of deaths per attack rather than the total number of deaths. To do this, we include the logged number of attacks as an “offset” variable in the model. We also have to remove cases with zero attacks, since a rate with a denominator of zero is undefined:

deaths_per_attack <- gtd_counts|>
  filter(n > 0)|>
  MASS::glm.nb(nkill ~ e_gdppc + offset(log(n)),
               data=_)


modelsummary(deaths_per_attack, 
             coef_rename = TRUE,
             estimate  = "{estimate}",  
             statistic = c("conf.int"),
             conf_level = .95,     
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title ='Number killed per attack. Exponentiated coefficients.',
             exponentiate =TRUE
             )
Number killed per attack. Exponentiated coefficients.

                           (1)
(Intercept)                0.554
                           [0.251, 1.404]
GDP per capita ($100,000)  0.673
                           [0.146, 4.591]
Num.Obs.                   106
AIC                        389.3
BIC                        397.3
Log.Lik.                   -191.671

Note: 95% CI in brackets

Considerations

  • For count data, a Poisson model is a reasonable starting point, but overdispersion is so common in real-world data that you should never assume it is absent, and you should be a bit skeptical if you see a Poisson model with no discussion of the problem.

Multiple categories

  • Logit/probit models work for a single dichotomous outcome, but how do we model cases with more than two possibilities?

Ordered logits

Ordered logit/probit models can be used to model ordered categorical outcomes, such as an agree/disagree scale.

One way to think about these models is that they work like the latent-data formulation of the logit model, but with multiple thresholds that may be unequally spaced.

Our goal here would be to estimate both the effect of X on the latent outcome Y* and the thresholds for each of the response categories.

Ordered logits

In a sense, you could think of this as a series of K−1 logistic regressions, one for each threshold of the ordered category:

\[ \begin{align*} Pr(Y > 1) & = \text{logit}^{-1}(X\beta - \text{cut}_1) \\ Pr(Y > 2) & = \text{logit}^{-1}(X\beta - \text{cut}_2) \\ Pr(Y > 3) & = \text{logit}^{-1}(X\beta - \text{cut}_3) \\ & \vdots \\ Pr(Y > K-1) & = \text{logit}^{-1}(X\beta - \text{cut}_{K-1}) \end{align*} \]
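We can see this equivalence in a quick simulation (a sketch, not from the slides): splitting a simulated ordered outcome at each threshold and running separate binary logits recovers roughly the same slope each time:

```r
set.seed(7)
n <- 5000
x <- rnorm(n)
ystar <- 2 * x + rlogis(n)                            # latent outcome, true slope 2
y <- cut(ystar, c(-Inf, -1, 1, Inf), labels = FALSE)  # 3 ordered categories

# each binary split estimates (roughly) the same slope,
# as the ordered-logit formulation implies
b1 <- coef(glm(I(y > 1) ~ x, family = binomial))["x"]
b2 <- coef(glm(I(y > 2) ~ x, family = binomial))["x"]
c(b1, b2)  # both close to the true slope of 2
```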

Ordered logits

set.seed(1000)
N <- 1000
x <- rnorm(N)
ystar <- x * 3

cutpoints <- c(-9, -5, 0, 6)
ycut <- cut(ystar + rlogis(N), c(-Inf, cutpoints, Inf),
            labels=c("lowest", "lower", "medium", "higher", "highest")
            ) 

df <- data.frame(x, ystar, ycut)
MASS::polr(ycut ~ x, data=df, method='logistic')
Call:
MASS::polr(formula = ycut ~ x, data = df, method = "logistic")

Coefficients:
       x 
2.961754 

Intercepts:
  lowest|lower   lower|medium  medium|higher higher|highest 
    -8.9015292     -4.7899480      0.1010896      5.7852156 

Residual Deviance: 1255.273 
AIC: 1265.273 

The terms labelled as intercepts here correspond to the cutpoints in my simulated latent variable, and the coefficient on x is just the effect of x on ystar.

Ordered logits

To get predictions for individual categories, we’ll need to use the inverse of the logit function and calculate:

\[ Pr(y = k) = \text{logit}^{-1}(c_k - X\beta) - \text{logit}^{-1}(c_{k-1} - X\beta) \]

Ordered logits: predictions

We’ll get the coefficient values and the cutpoints from the model:

# inverse logit (equivalent to plogis())
invlogit <- function(x){
  exp(x)/(1 + exp(x))
}
ologit<-MASS::polr(ycut ~ x, data=df, method='logistic')
# coefficients
coefs<-ologit$coefficients
# intercepts
cutpoints<-ologit$zeta

Ordered logits: predictions

Now we’ll calculate individual response probabilities:

# Pr(k = 1) if X =2
invlogit(cutpoints[1] - 2 * coefs)
lowest|lower 
3.643911e-07 
# Pr(k = 2) if X =2
invlogit(cutpoints[2] - 2 * coefs) - invlogit(cutpoints[1] - 2 * coefs)
lower|medium 
2.187871e-05 
# Pr(k = 3) if X =2
invlogit(cutpoints[3] - 2 * coefs) - invlogit(cutpoints[2] - 2 * coefs)
medium|higher 
  0.002929454 
# Pr(k = 4) if X =2
invlogit(cutpoints[4] - 2 * coefs) - invlogit(cutpoints[3] - 2 * coefs)
higher|highest 
       0.46253 
# Pr(k = 5) if X =2
1 - invlogit(cutpoints[4] - 2 * coefs)
higher|highest 
     0.5345183 
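As a check, these category probabilities telescope, so they must sum to one. A small sketch using the cutpoints and coefficient printed above:

```r
invlogit <- function(x) 1 / (1 + exp(-x))

cutpoints <- c(-8.9015292, -4.7899480, 0.1010896, 5.7852156)  # from the fitted model
b <- 2.961754
xb <- 2 * b  # linear predictor at x = 2

# Pr(y <= k) at each cutpoint, padded with 0 and 1, then differenced
p <- diff(c(0, invlogit(cutpoints - xb), 1))
p       # the five category probabilities computed above
sum(p)  # 1
```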

Ordered logits

Of course, we can also just use predict to get these same results:

predict(ologit, type='probs', newdata=data.frame(x=3))
      lowest        lower       medium       higher      highest 
1.884926e-08 1.131768e-06 1.519636e-04 4.295219e-02 9.568947e-01 

Ordered logits

Or we can use the ggeffects package to get predictions across all values of x and all levels of the outcome:

library(ggeffects)

result<-predict_response(ologit)

plot(result)

Ordered logits

Here’s a more practical example using European Social Survey data for Great Britain. The dependent variable is agreement that richer countries have an obligation to take in immigrants, and the predictors are whether the respondent has been unemployed in the last 5 years and their self-placement on a left/right scale:

ologit<-MASS::polr(richer_responsible ~ uemp5yr + lrscale, data=gb, 
                   method='logistic')
ologit
Call:
MASS::polr(formula = richer_responsible ~ uemp5yr + lrscale, 
    data = gb, method = "logistic")

Coefficients:
   uemp5yrYes       lrscale 
 0.0007202287 -0.1513971718 

Intercepts:
         Disagree strongly|Disagree Disagree|Neither agree nor disagree 
                          -4.077814                           -1.316149 
   Neither agree nor disagree|Agree                Agree|Agree strongly 
                          -0.553276                            2.047397 

Residual Deviance: 1059.459 
AIC: 1071.459 

Ordered logits

preds<-predict_response(ologit)
pp<-plot(preds)
pp$uemp5yr
pp$lrscale

Ordered logits

And we can get estimated average marginal effects from the model as well. Note that we have 5 different margins for each coefficient, one for each category of the DV:

library(marginaleffects)

avg_comparisons(ologit)|>
  dplyr::select(term, group, contrast, estimate, conf.low, conf.high)|>
  mutate(across(where(is.numeric), .fns=~round(.x, digits=4)))

    Term                      Group Contrast Estimate   2.5 %  97.5 %
 lrscale Disagree strongly          +1         0.0055  0.0002  0.0107
 lrscale Disagree                   +1         0.0296  0.0075  0.0516
 lrscale Neither agree nor disagree +1         0.0017 -0.0008  0.0041
 lrscale Agree                      +1        -0.0287 -0.0500 -0.0075
 lrscale Agree strongly             +1        -0.0080 -0.0144 -0.0015
 uemp5yr Disagree strongly          Yes - No   0.0000 -0.0122  0.0122
 uemp5yr Disagree                   Yes - No  -0.0001 -0.0704  0.0701
 uemp5yr Neither agree nor disagree Yes - No   0.0000 -0.0064  0.0063
 uemp5yr Agree                      Yes - No   0.0001 -0.0682  0.0685
 uemp5yr Agree strongly             Yes - No   0.0000 -0.0204  0.0205

Ordered logits: considerations

  • Ordered logits make the most sense when you have a relatively small number of ordered categories and you think the differences between the responses are “unevenly spaced” with respect to your IVs

  • If you have a lot of response categories (7 or more), a linear model is generally a reasonable alternative.

  • Even in cases with a smaller number of categories, the juice may not be worth the squeeze, but an ordered logit might be worth exploring if you’re unsatisfied with your results.

Other models

  • Multinomial logit: models for multiple unordered discrete categories
    • K−1 sets of coefficients, one for each non-baseline category of the DV.
    • Coefficients give the effect on the logged odds of being in a category relative to the baseline category.
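A minimal sketch with simulated data, using nnet::multinom() (one common implementation; nnet ships with R as a recommended package):

```r
library(nnet)
set.seed(9)
n <- 2000
x <- rnorm(n)

# category 1 is the baseline; log-odds vs. baseline: 1 + x for
# category 2, -1 + 2x for category 3
eta <- cbind(0, 1 + x, -1 + 2 * x)
p <- exp(eta) / rowSums(exp(eta))
y <- factor(apply(p, 1, function(pr) sample(1:3, 1, prob = pr)))

m <- multinom(y ~ x, trace = FALSE)
coef(m)  # K-1 = 2 rows of coefficients, one per non-baseline category
```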